Fast Algorithms For String Matching With And Without Swaps
Abstract
Given a text string T of length n and a pattern string P of length m over some alphabet Σ, we want to find the occurrences in T of strings P′ that can be derived from P by a set of local swaps, i.e., transpositions of two adjacent characters, where each character participates in at most one swap. We give several simple but fast algorithms for the problem. The first algorithm is based on the Boyer–Moore–Horspool approach. The second algorithm simulates a nondeterministic finite automaton using a shift-or type method. We improve shift-or so that it takes only $O(n \lceil (m+1)/w \rceil / \log_{|\Sigma|} s)$ time, where s ≥ |Σ| is the space usage of the algorithm and w is the length of the machine word. This algorithm is sublinear for small patterns and alphabets, and it is asymptotically the fastest bit-parallel simulation of sufficiently simple nondeterministic finite automata. Finally, we show how the bit-parallel suffix automaton can be used to solve the problem in optimal average time $O(n \log m / m)$, while taking only $O(n \lceil m/w \rceil)$ time in the worst case. The algorithms are very simple to implement, and experimental results show that they are very fast on natural language.
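To make the problem definition and the automaton idea concrete, here is a minimal Python sketch; it is not the authors' code. It assumes the swap semantics stated above (disjoint adjacent transpositions, each character swapped at most once) and uses the shift-and formulation (1-bits mark active states, the complement of shift-or) on Python's unbounded integers rather than w-bit machine words, so it does not implement the improved $O(n \lceil (m+1)/w \rceil / \log_{|\Sigma|} s)$ variant. The function names `swap_match_naive` and `swap_match_shift_and` are hypothetical. Two bit vectors model the automaton: `d` marks pattern prefixes fully matched at the current text position, and `e` marks prefixes whose next two pattern characters are halfway through a swap.

```python
# Sketch only: hypothetical function names, not the paper's implementation.
# Swap matching: P matches a text window if it can be turned into that window
# by swapping disjoint pairs of adjacent characters (each character at most once).


def swap_match_naive(p: str, t: str) -> list[int]:
    """Reference checker: dynamic programming over each text window."""
    m, n = len(p), len(t)
    occurrences = []
    for i in range(n - m + 1):
        w = t[i:i + m]
        reach = [False] * (m + 1)  # reach[j]: p[:j] can be aligned with w[:j]
        reach[0] = True
        for j in range(m):
            if not reach[j]:
                continue
            if p[j] == w[j]:  # no swap at position j
                reach[j + 1] = True
            if j + 1 < m and p[j] == w[j + 1] and p[j + 1] == w[j]:  # swap j, j+1
                reach[j + 2] = True
        if reach[m]:
            occurrences.append(i)
    return occurrences


def swap_match_shift_and(p: str, t: str) -> list[int]:
    """Bit-parallel simulation of the swap-matching automaton (shift-and form)."""
    m = len(p)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # masks[c] has bit j set iff p[j] == c
    masks: dict[str, int] = {}
    for j, c in enumerate(p):
        masks[c] = masks.get(c, 0) | (1 << j)
    top = 1 << (m - 1)
    d = 0  # bit j: p[0..j] matched, ending at the current text position
    e = 0  # bit j: p[:j] matched before this char, and this char equals p[j+1]
    occurrences = []
    for i, c in enumerate(t):
        b = masks.get(c, 0)
        x = (d << 1) | 1  # prefixes p[:j] that may be extended (empty prefix always)
        # New d: extend without a swap, or complete a pending swap (c must equal p[j]).
        # New e: start a swap where c equals the next pattern character p[j+1].
        d, e = (x & b) | ((e & b) << 1), x & (b >> 1)
        if d & top:
            occurrences.append(i - m + 1)
    return occurrences


if __name__ == "__main__":
    print(swap_match_shift_and("abc", "bacacb"))  # [0, 3]
```

For P = "abc" and T = "bacacb", both functions return [0, 3]: the window "bac" swaps the first two pattern characters and "acb" swaps the last two. The naive checker is kept alongside as a slow reference against which the bit-parallel version can be compared.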
Similar resources
Approximate Swapped Matching
Let a text string T of n symbols and a pattern string P of m symbols from alphabet Σ be given. A swapped version P′ of P is a length-m string derived from P by a series of local swaps (i.e., $p'_\ell \leftarrow p_{\ell+1}$ and $p'_{\ell+1} \leftarrow p_\ell$), where each element can participate in no more than one swap. The Pattern Matching with Swaps problem is that of finding all locations i of T for which there exists a swapped versio...
Efficient Special Cases of Pattern Matching with Swaps
Let a text string T of n symbols and a pattern string P of m symbols from alphabet Σ be given. A swapped version T′ of T is a length-n string derived from T by a series of local swaps (i.e., $t'_\ell \leftarrow t_{\ell+1}$ and $t'_{\ell+1} \leftarrow t_\ell$), where each element can participate in no more than one swap. The Pattern Matching with Swaps problem is that of finding all locations i for which there exists a swapped version T′...
Efficient Algorithms for Approximate String Matching with Swaps (Extended Abstract)
Most research on the edit distance problem and the k-differences problem considered the set of edit operations consisting of changes, insertions, and deletions. In this paper we include the swap operation that interchanges two adjacent characters into the set of allowable edit operations, and we present an O(t min(m, n))-time algorithm for the extended edit distance problem, where t is the edit...
Pattern Matching with Swaps
A preliminary version of this paper appeared in FOCS 97. Let a text string T of n symbols and a pattern string P of m symbols from alphabet Σ be given. A swapped version T′ of T is a length-n string derived from T by a series of local swaps (i.e., $t'_\ell \leftarrow t_{\ell+1}$ and $t'_{\ell+1} \leftarrow t_\ell$), where each element can participate in no more than one swap. The pattern matching with swaps problem is that of finding all lo...
Performance Evaluation of Local Detectors in the Presence of Noise for Multi-Sensor Remote Sensing Image Matching
Automatic, efficient, accurate, and stable image matching is one of the most critical issues in remote sensing, photogrammetry, and machine vision. In recent decades, various algorithms have been proposed based on the feature-based framework, which concentrates on detecting and describing local features. Understanding the characteristics of different matching algorithms in various applications ...